Measuring the Structural Similarity between Source Code Entities (S)
Authors
Abstract
Similarity coefficients are widely used in software engineering for several purposes, such as identifying refactoring opportunities and remodularizing systems. Although the literature provides several similarity coefficients that differ in how they are computed, researchers tend to reuse the coefficients that others in their field already use. Consequently, some approaches might be using a coefficient that is inadequate for their purpose. In this paper, we conduct a quantitative study that compares 18 coefficients to identify which one is the most appropriate for determining where a class should be located. Our evaluation covers 111 open source systems from the Qualitas Corpus, which together contain more than 70,000 classes. We observed that Jaccard, one of the most used coefficients in our area, did not produce the best results. While Jaccard correctly indicated the suitable module for 22% of the classes, other coefficients were able to do so for 60%.
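As a rough illustration of the kind of comparison the abstract describes, the sketch below computes the Jaccard coefficient, |A ∩ B| / |A ∪ B|, between the dependency sets of two classes and suggests the module whose classes are on average most similar to a target class. The abstract does not spell out the paper's exact procedure, so the averaging strategy, the function names (`jaccard`, `best_module`), and the sample data are all assumptions made for this example.

```python
from statistics import mean


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_module(target: str, deps: dict, modules: dict) -> str:
    """Return the module whose classes are, on average, most similar to `target`.

    Assumption: a module's score is the mean Jaccard similarity between the
    target's dependency set and that of every other class in the module.
    """
    scores = {}
    for name, classes in modules.items():
        others = [c for c in classes if c != target]
        if others:
            scores[name] = mean(jaccard(deps[target], deps[c]) for c in others)
    return max(scores, key=scores.get)


# Hypothetical data: each class maps to the set of types it depends on.
deps = {
    "OrderService": {"Order", "Customer", "Logger"},
    "OrderValidator": {"Order", "Customer"},
    "InvoicePrinter": {"Invoice", "Logger"},
    "InvoiceMailer": {"Invoice", "Mailer"},
}
modules = {
    "orders": ["OrderService", "OrderValidator"],
    "billing": ["InvoicePrinter", "InvoiceMailer"],
}

print(best_module("OrderService", deps, modules))  # prints "orders"
```

Swapping `jaccard` for another coefficient while keeping `best_module` unchanged is one simple way to compare how different coefficients rank the same candidate modules.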
Similar resources
Hapax - Enriching Reverse Engineering with Semantic Clustering
Many reverse engineering approaches focus on structural information and ignore semantic information like the naming of identifiers or comments. But developers put their domain knowledge into exactly these parts of the source code. Without understanding the semantics of the code, one cannot tell its meaning. We use Latent Semantic Indexing, an information retrieval technique [3], to retrieve the...
A Source Code Similarity System for Plagiarism Detection
Source code plagiarism is an easy to do task, but very difficult to detect without proper tool support. Various source code similarity detection systems have been developed to help detect source code plagiarism. Those systems need to recognize a number of lexical and structural source code modifications. For example, by some structural modifications (e.g. modification of control structures, mod...
Measuring Semantic Similarity using a Multi-Tree Model
Recommender systems and search engines are examples of systems that have used techniques such as Pearson’s product-moment correlation coefficient or Cosine similarity for measuring semantic similarity between two entities. These methods relinquish semantic relations between pairs of features in the vector representation of an entity. This paper describes a new technique for calculating semant...
Alignment-free local structural search by writhe decomposition
MOTIVATION: Rapid methods for protein structure search enable biological discoveries based on flexibly defined structural similarity, unleashing the power of the ever greater number of solved protein structures. Projection methods show promise for the development of fast structural database search solutions. Projection methods map a structure to a point in a high-dimensional space and compare tw...
Measuring Similarity of Large Software Systems Based on Source Code Correspondence
It is an important and intriguing issue to know the quantitative similarity of large software systems. In this paper, a similarity metric between two sets of source code files based on the correspondence of overall source code lines is proposed. A Software similarity MeAsurement Tool (SMAT) was developed and applied to various versions of an operating system (BSD UNIX OS). The resulting similarity...